-
-
Notifications
You must be signed in to change notification settings - Fork 116
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add naming new naming convention to docs #2874
Add naming new naming convention to docs #2874
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## rename-core-assets #2874 +/- ##
==================================================
Coverage 88.6% 88.6%
==================================================
Files 91 91
Lines 11019 11021 +2
==================================================
+ Hits 9771 9773 +2
Misses 1248 1248
☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
High level comments:
- Don't want to duplicate too much information between the intro page and the data naming conventions page.
- I made suggestions to the naming conventions page before the introductions page, so any stuff fixed there should also be fixed in the introductions page too (or de-duplicated).
- Worth thinking about the purpose of each of the docs pages. Docs are starting to get a little bit like a maze! (Not your fault or due to this PR specifically). We will probably address the bulk of this if/when we get the ComDev grant, but worth thinking about a little now. Specifically, the difference between Intro and Data Access.
- Need to be careful to clarify what is in
pudl.sqlite
and what is not. Also centerpudl.sqlite
as the main resource.
docs/data_access.rst
Outdated
PUDL's primary data output is the ``pudl.sqlite`` database. It contains a collection | ||
of tables that follow :ref:`PUDL's asset naming convention <asset-naming>`. Tables | ||
with the ``core_`` prefix are normalized tables that serve as building blocks for the | ||
more denormalized and easy to work with ``output_`` tables. **We recommend only working | ||
with ``output_`` tables.** | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should consider the fact that many users may not know what normalized and denormalized data means in this context. It might make sense to get rid of the sentence
Tables with the ``core_`` prefix are normalized tables that serve as building blocks for the more denormalized and easy to work with ``output_`` tables.
and just say
We recommend working with tables with the ``output_`` prefix as these tables contain the most complete data. For more information about the different types of tables, read through the naming conventions.
Or something like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I think we are getting there. Just a few more comments about slimming stuff down and removing duplicate information.
are stored in a data warehouse as a collection of SQLite and Parquet files so that | ||
users can access the data without having to run any code. Learn more about how to | ||
access the data `here <https://catalystcoop-pudl.readthedocs.io/en/dev/data_access.html>`__. | ||
|
||
What data is available? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment in intro page regarding data sources
I think the root issue of our docs maze is it’s not clear where the starting point is for users and contributors. Is it:
Should the starting point be different for contributors and users? |
Yeah.....this is so true! Is it possible to not have the README be the first page of the docs? That might help with the confusion if they were more seperate. |
Yes it is possible to not include the README in the docs which might be a good option. |
Ok! I made a few changes:
|
Hooray @bendnorman this looks great! My only suggestion is teeny tiny and it's to add a little caveat that CEMS data is stored in parquet files and not the sqlite db because it's so big. I'm referring to the spot on the into (now index) page where it says:
|
Wahoo! Made a change that explains larger datasets live in parquet files. |
…lease-notes Add naming convention change to release notes
PR Overview
This PR adds the new asset/table naming convention laid out in this design doc to the PUDL documents.
Changes in this PR:
out_
tables.docs/dev/data_guidelines.rst
that aren't relevant anymore.docs/dev/naming_conventions.rst
docs/intro.rst
with new raw, core, output layers.PR Checklist
dev
).